feat(pt): add descriptor name & parameter numbers output & gpu name (only for cuda) & Capitalise some infos #5140
OutisLi wants to merge 3 commits into deepmodeling:master
Conversation
📝 Walkthrough: Adds a private Trainer method `_log_model_summary()` that logs the descriptor type and parameter count during training initialization.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes. Pre-merge checks: ✅ 3 of 3 passed.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @deepmd/pt/train/training.py:
- Around line 731-744: The get_descriptor_type function can raise
IndexError/AttributeError/KeyError when accessing model.atomic_model.models[0]
or calling serialize()["type"]; add a bounds check to ensure
model.atomic_model.models is a non-empty sequence before indexing, and wrap
calls to descriptor.serialize() and dict access to ["type"] in a try/except that
falls back to "UNKNOWN" (or returns the serialized type plus " (with ZBL)" on
success); update checks to verify dp_model has a descriptor attribute and that
its serialize() returns a mapping with a "type" key to avoid unhandled
exceptions.
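One way to apply the suggested hardening is sketched below. The attribute layout (`get_descriptor`, `atomic_model.models`, `serialize()["type"]`) is taken from the diff under review, but this standalone function is a hypothetical sketch, not the project's actual code:

```python
from typing import Any


def get_descriptor_type(model: Any) -> str:
    """Return the descriptor type name, or "UNKNOWN" if it cannot be determined."""
    try:
        # Standard models expose the descriptor directly.
        if hasattr(model, "get_descriptor"):
            return model.get_descriptor().serialize()["type"].upper()
        # ZBL-style models wrap the DP model in atomic_model.models[0];
        # `models` may be missing or empty, so guard before indexing.
        models = getattr(getattr(model, "atomic_model", None), "models", None)
        if models and hasattr(models[0], "descriptor"):
            serialized = models[0].descriptor.serialize()
            if isinstance(serialized, dict) and "type" in serialized:
                return serialized["type"].upper() + " (with ZBL)"
    except (IndexError, AttributeError, KeyError, TypeError):
        # Any unexpected model structure degrades to the fallback label
        # instead of crashing the training startup log.
        pass
    return "UNKNOWN"
```

The `getattr(..., None)` chain plus the emptiness check covers the missing-attribute and empty-sequence cases up front, while the broad `except` keeps a malformed `serialize()` result from propagating.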
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/pt/train/training.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/train/training.py (2)
- deepmd/pt/model/model/dp_model.py (1): `get_descriptor` (52-54)
- deepmd/pt/model/atomic_model/dp_atomic_model.py (1): `serialize` (168-180)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (42)
- GitHub Check: Agent
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Test Python (9, 3.13)
- GitHub Check: Test Python (7, 3.13)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (12, 3.13)
- GitHub Check: Test Python (7, 3.10)
- GitHub Check: Test Python (11, 3.13)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (11, 3.10)
- GitHub Check: Test Python (9, 3.10)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (10, 3.13)
- GitHub Check: Test Python (3, 3.13)
- GitHub Check: Test Python (6, 3.13)
- GitHub Check: Test Python (8, 3.10)
- GitHub Check: Test Python (12, 3.10)
- GitHub Check: Test Python (10, 3.10)
- GitHub Check: Test Python (8, 3.13)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (clang, clang)
🔇 Additional comments (3)
deepmd/pt/train/training.py (3)
724-726: LGTM! The placement and conditional execution on rank 0 are appropriate for logging model summary information after initialization.
746-748: Verify whether all parameters or only trainable parameters should be counted. The current implementation counts all parameters, including non-trainable ones. If the intent is to count only trainable parameters, consider filtering by `p.requires_grad`.

Alternative implementation, if only trainable parameters are intended:

```diff
 def count_parameters(model: Any) -> int:
     """Count the total number of trainable parameters."""
-    return sum(p.numel() for p in model.parameters())
+    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```
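The behavioral difference is easy to demonstrate without a torch dependency, using hypothetical stand-ins that mimic `numel()` and `requires_grad`:

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for torch.nn.Parameter."""

    n: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.n


class TinyModel:
    """Hypothetical stand-in for torch.nn.Module with one frozen parameter."""

    def __init__(self):
        # One trainable weight and one frozen embedding.
        self._params = [Param(1000), Param(500, requires_grad=False)]

    def parameters(self):
        return iter(self._params)


def count_all(model) -> int:
    return sum(p.numel() for p in model.parameters())


def count_trainable(model) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print(count_all(TinyModel()))        # 1500: frozen params included
print(count_trainable(TinyModel()))  # 1000: only requires_grad params
```

Whether the 500 frozen parameters should appear in the logged "Model params" figure is exactly the design question the comment raises.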
750-761: LGTM! The logging logic correctly handles both single-task and multi-task models, formatting the output appropriately for each case.
Pull request overview
This PR adds logging functionality to output model summary information during training initialization, specifically the descriptor type and parameter count. This provides useful diagnostic information about the model being trained.
Key changes:
- Added a `_log_model_summary()` method to the `Trainer` class that logs descriptor type and total parameter count
- Supports both single-task and multi-task training scenarios
- Handles both standard models and ZBL models with appropriate descriptor type detection
```python
def _log_model_summary(self) -> None:
    """Log model summary information including descriptor type and parameter count."""

    def get_descriptor_type(model: Any) -> str:
        """Get the descriptor type name from model."""
        # Standard models have get_descriptor method
        if hasattr(model, "get_descriptor"):
            descriptor = model.get_descriptor()
            return descriptor.serialize()["type"].upper()
        # ZBL models: descriptor is in atomic_model.models[0]
        if hasattr(model, "atomic_model") and hasattr(model.atomic_model, "models"):
            dp_model = model.atomic_model.models[0]
            if hasattr(dp_model, "descriptor"):
                return (
                    dp_model.descriptor.serialize()["type"].upper() + " (with ZBL)"
                )
        return "UNKNOWN"

    def count_parameters(model: Any) -> int:
        """Count the total number of trainable parameters."""
        return sum(p.numel() for p in model.parameters())

    if not self.multi_task:
        desc_type = get_descriptor_type(self.model)
        num_params = count_parameters(self.model)
        log.info(f"Descriptor: {desc_type}")
        log.info(f"Model params: {num_params / 1e6:.3f} M")
    else:
        # For multi-task, log each model's info
        for model_key in self.model_keys:
            desc_type = get_descriptor_type(self.model[model_key])
            num_params = count_parameters(self.model[model_key])
            log.info(f"Descriptor [{model_key}]: {desc_type}")
            log.info(f"Model params [{model_key}]: {num_params / 1e6:.3f} M")
```
The new _log_model_summary() method that logs descriptor type and parameter count lacks explicit test coverage. While existing training tests will execute this code path, consider adding a dedicated test to verify that the descriptor type is correctly detected for different model types (standard models, ZBL models, multi-task models) and that the parameter counting logic works as expected. This would help catch potential issues with model structure assumptions.
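A test in that spirit might look like the following sketch. The fake model classes are hypothetical, and the helper is duplicated at module level here only because the real one is nested inside `_log_model_summary()` and cannot be imported directly:

```python
class FakeDescriptor:
    def serialize(self):
        return {"type": "se_e2_a"}


class FakeStandardModel:
    def get_descriptor(self):
        return FakeDescriptor()


class FakeDPModel:
    descriptor = FakeDescriptor()


class FakeAtomicModel:
    models = [FakeDPModel()]


class FakeZBLModel:
    atomic_model = FakeAtomicModel()


def get_descriptor_type(model):
    # Module-level copy of the nested helper, for testability.
    if hasattr(model, "get_descriptor"):
        return model.get_descriptor().serialize()["type"].upper()
    if hasattr(model, "atomic_model") and hasattr(model.atomic_model, "models"):
        dp_model = model.atomic_model.models[0]
        if hasattr(dp_model, "descriptor"):
            return dp_model.descriptor.serialize()["type"].upper() + " (with ZBL)"
    return "UNKNOWN"


def test_descriptor_detection():
    assert get_descriptor_type(FakeStandardModel()) == "SE_E2_A"
    assert get_descriptor_type(FakeZBLModel()) == "SE_E2_A (with ZBL)"
    assert get_descriptor_type(object()) == "UNKNOWN"
```

Factoring the helpers out of `_log_model_summary()` (or into a small utility module) would let such a test target them without instantiating a full `Trainer`.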
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master    #5140     +/-   ##
=========================================
  Coverage   81.94%   81.94%
=========================================
  Files         712      712
  Lines       72887    72915     +28
  Branches     3616     3617      +1
=========================================
+ Hits        59725    59751     +26
- Misses      11998    12000      +2
  Partials     1164     1164
```

☔ View full report in Codecov by Sentry.
Summary by CodeRabbit